Overview

Dataset statistics

Number of variables: 37
Number of observations: 700
Missing cells: 700
Missing cells (%): 2.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 202.5 KiB
Average record size in memory: 296.2 B
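
The missing-cell percentage can be checked by hand from the counts in this block. The sketch below is only a worked calculation using the figures reported here; the variable names are illustrative and are not part of the profiling output.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 37
n_observations = 700
missing_cells = 700

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
```

The 700 missing cells equal the number of observations, which is consistent with the warning that a single column (_c39) is missing in every row.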

Variable types

Categorical: 25
Numeric: 11
Unsupported: 1

Warnings

Customer_ID has a high cardinality: 700 distinct values [High cardinality]
policy_bind_date has a high cardinality: 671 distinct values [High cardinality]
incident_location has a high cardinality: 700 distinct values [High cardinality]
incident_date has a high cardinality: 60 distinct values [High cardinality]
months_as_customer is highly correlated with age [High correlation]
age is highly correlated with months_as_customer [High correlation]
auto_model is highly correlated with auto_make [High correlation]
auto_make is highly correlated with auto_model [High correlation]
_c39 has 700 (100.0%) missing values [Missing]
Customer_ID is uniformly distributed [Uniform]
policy_bind_date is uniformly distributed [Uniform]
incident_location is uniformly distributed [Uniform]
Customer_ID has unique values [Unique]
policy_number has unique values [Unique]
incident_location has unique values [Unique]
_c39 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
capital-gains has 350 (50.0%) zeros [Zeros]
capital-loss has 326 (46.6%) zeros [Zeros]
incident_hour_of_the_day has 40 (5.7%) zeros [Zeros]
umbrella_limit has 561 (80.1%) zeros [Zeros]
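
The percentages in the zero-value warnings follow directly from the zero counts and the 700 observations. The sketch below recomputes them as a sanity check; the dictionary is just the counts copied from the warnings above, and nothing here is taken from the profiling library itself.

```python
n_observations = 700

# Zero counts copied from the "Zeros" warnings.
zero_counts = {
    "capital-gains": 350,
    "capital-loss": 326,
    "incident_hour_of_the_day": 40,
    "umbrella_limit": 561,
}

# Percentage of rows that are zero, rounded to one decimal like the report.
zero_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```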

Reproduction

Analysis started: 2021-04-19 12:03:44.240394
Analysis finished: 2021-04-19 12:04:16.246564
Duration: 32.01 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

Customer_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 700
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB
Customer_352: 1
Customer_401: 1
Customer_898: 1
Customer_806: 1
Customer_951: 1
Other values (695): 695

Length

Max length: 12
Median length: 12
Mean length: 11.9
Min length: 10

Characters and Unicode

Total characters: 8330
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
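
As a rough illustration of what these character properties look like in practice, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two sample strings are made up to match the Customer_ID format, and this is not the code the profiling tool runs. The standard library exposes general categories but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# Two illustrative values in the same format as the Customer_ID column.
values = ["Customer_352", "Customer_58"]

# Count the two-letter Unicode general category of every character:
# Lu = Uppercase Letter, Ll = Lowercase Letter,
# Nd = Decimal Number,  Pc = Connector Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

for category, count in category_counts.most_common():
    print(category, count)
```

The four categories that appear here are the same four listed for this variable under "Most occurring categories".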

Unique

Unique: 700
Unique (%): 100.0%

Sample

1st row: Customer_541
2nd row: Customer_440
3rd row: Customer_482
4th row: Customer_422
5th row: Customer_778
Value | Count | Frequency (%)
Customer_352 | 1 | 0.1%
Customer_401 | 1 | 0.1%
Customer_898 | 1 | 0.1%
Customer_806 | 1 | 0.1%
Customer_951 | 1 | 0.1%
Customer_927 | 1 | 0.1%
Customer_279 | 1 | 0.1%
Customer_420 | 1 | 0.1%
Customer_58 | 1 | 0.1%
Customer_614 | 1 | 0.1%
Other values (690) | 690 | 98.6%
Histogram of lengths of the category
Value | Count | Frequency (%)
customer_177 | 1 | 0.1%
customer_46 | 1 | 0.1%
customer_735 | 1 | 0.1%
customer_222 | 1 | 0.1%
customer_268 | 1 | 0.1%
customer_154 | 1 | 0.1%
customer_257 | 1 | 0.1%
customer_976 | 1 | 0.1%
customer_140 | 1 | 0.1%
customer_605 | 1 | 0.1%
Other values (690) | 690 | 98.6%

Most occurring characters

Value | Count | Frequency (%)
C | 700 | 8.4%
u | 700 | 8.4%
s | 700 | 8.4%
t | 700 | 8.4%
o | 700 | 8.4%
m | 700 | 8.4%
e | 700 | 8.4%
r | 700 | 8.4%
_ | 700 | 8.4%
1 | 219 | 2.6%
Other values (9) | 1811 | 21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4900
58.8%
Decimal Number2030
24.4%
Uppercase Letter700
 
8.4%
Connector Punctuation700
 
8.4%

Most frequent character per category

ValueCountFrequency (%)
1219
10.8%
7219
10.8%
4216
10.6%
8214
10.5%
5211
10.4%
3210
10.3%
2208
10.2%
6208
10.2%
9186
9.2%
0139
6.8%
ValueCountFrequency (%)
u700
14.3%
s700
14.3%
t700
14.3%
o700
14.3%
m700
14.3%
e700
14.3%
r700
14.3%
ValueCountFrequency (%)
C700
100.0%
ValueCountFrequency (%)
_700
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5600
67.2%
Common2730
32.8%

Most frequent character per script

ValueCountFrequency (%)
_700
25.6%
1219
 
8.0%
7219
 
8.0%
4216
 
7.9%
8214
 
7.8%
5211
 
7.7%
3210
 
7.7%
2208
 
7.6%
6208
 
7.6%
9186
 
6.8%
ValueCountFrequency (%)
C700
12.5%
u700
12.5%
s700
12.5%
t700
12.5%
o700
12.5%
m700
12.5%
e700
12.5%
r700
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII8330
100.0%

Most frequent character per block

ValueCountFrequency (%)
C700
 
8.4%
u700
 
8.4%
s700
 
8.4%
t700
 
8.4%
o700
 
8.4%
m700
 
8.4%
e700
 
8.4%
r700
 
8.4%
_700
 
8.4%
1219
 
2.6%
Other values (9)1811
21.7%

months_as_customer
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 346
Distinct (%): 49.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 209.5285714
Minimum: 0
Maximum: 479
Zeros: 1
Zeros (%): 0.1%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 32
Q1: 123
Median: 209
Q3: 283
95-th percentile: 434.05
Maximum: 479
Range: 479
Interquartile range (IQR): 160

Descriptive statistics

Standard deviation: 114.746174
Coefficient of variation (CV): 0.5476397477
Kurtosis: -0.5107224877
Mean: 209.5285714
Median Absolute Deviation (MAD): 80.5
Skewness: 0.3307605275
Sum: 146670
Variance: 13166.68445
Monotonicity: Not monotonic
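
Several of the figures reported for months_as_customer are simple functions of the others, so they can be cross-checked without the raw data. The sketch below does that using the usual textbook definitions (CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1); it assumes the report uses the same definitions, and the variable names are illustrative.

```python
# Values copied from the months_as_customer statistics above.
n_observations = 700
mean = 209.5285714
std_dev = 114.746174
minimum, maximum = 0, 479
q1, q3 = 123, 283

# Derived quantities, using the standard definitions.
coefficient_of_variation = std_dev / mean
variance = std_dev ** 2
value_range = maximum - minimum
iqr = q3 - q1
total = mean * n_observations

print(f"CV:       {coefficient_of_variation:.10f}")
print(f"Variance: {variance:.5f}")
print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"Sum:      {total:.0f}")
```

Because the inputs are already rounded in the report, the recomputed values agree with the reported ones only up to that rounding.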
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
230 | 6 | 0.9%
295 | 6 | 0.9%
222 | 6 | 0.9%
245 | 6 | 0.9%
290 | 6 | 0.9%
285 | 5 | 0.7%
163 | 5 | 0.7%
143 | 5 | 0.7%
128 | 5 | 0.7%
126 | 5 | 0.7%
Other values (336) | 645 | 92.1%
Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 2 | 0.3%
3 | 1 | 0.1%
4 | 2 | 0.3%
Value | Count | Frequency (%)
479 | 1 | 0.1%
478 | 1 | 0.1%
476 | 1 | 0.1%
475 | 1 | 0.1%
473 | 1 | 0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 46
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.41714286
Minimum: 19
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 19
5-th percentile: 26
Q1: 32
Median: 39
Q3: 45
95-th percentile: 57
Maximum: 64
Range: 45
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.170472168
Coefficient of variation (CV): 0.2326518744
Kurtosis: -0.2994174588
Mean: 39.41714286
Median Absolute Deviation (MAD): 6
Skewness: 0.4700257301
Sum: 27592
Variance: 84.09755978
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Value | Count | Frequency (%)
39 | 38 | 5.4%
43 | 36 | 5.1%
34 | 32 | 4.6%
32 | 31 | 4.4%
38 | 30 | 4.3%
41 | 29 | 4.1%
37 | 27 | 3.9%
31 | 27 | 3.9%
40 | 26 | 3.7%
33 | 26 | 3.7%
Other values (36) | 398 | 56.9%
Value | Count | Frequency (%)
19 | 1 | 0.1%
20 | 1 | 0.1%
21 | 3 | 0.4%
22 | 1 | 0.1%
23 | 3 | 0.4%
Value | Count | Frequency (%)
64 | 1 | 0.1%
63 | 1 | 0.1%
62 | 3 | 0.4%
61 | 9 | 1.3%
60 | 8 | 1.1%

insured_sex
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB
FEMALE: 367
MALE: 333

Length

Max length6
Median length6
Mean length5.048571429
Min length4

Characters and Unicode

Total characters3534
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFEMALE
2nd rowMALE
3rd rowMALE
4th rowMALE
5th rowMALE
Value | Count | Frequency (%)
FEMALE | 367 | 52.4%
MALE | 333 | 47.6%
Histogram of lengths of the category
ValueCountFrequency (%)
female367
52.4%
male333
47.6%

Most occurring characters

ValueCountFrequency (%)
E1067
30.2%
M700
19.8%
A700
19.8%
L700
19.8%
F367
 
10.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3534
100.0%

Most frequent character per category

ValueCountFrequency (%)
E1067
30.2%
M700
19.8%
A700
19.8%
L700
19.8%
F367
 
10.4%

Most occurring scripts

ValueCountFrequency (%)
Latin3534
100.0%

Most frequent character per script

ValueCountFrequency (%)
E1067
30.2%
M700
19.8%
A700
19.8%
L700
19.8%
F367
 
10.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII3534
100.0%

Most frequent character per block

ValueCountFrequency (%)
E1067
30.2%
M700
19.8%
A700
19.8%
L700
19.8%
F367
 
10.4%
Distinct7
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
JD
117 
High School
115 
MD
108 
Associate
104 
College
90 
Other values (2)
166 

Length

Max length11
Median length7
Mean length5.912857143
Min length2

Characters and Unicode

Total characters4139
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJD
2nd rowMasters
3rd rowJD
4th rowHigh School
5th rowPhD
ValueCountFrequency (%)
JD117
16.7%
High School115
16.4%
MD108
15.4%
Associate104
14.9%
College90
12.9%
Masters90
12.9%
PhD76
10.9%
Histogram of lengths of the category
ValueCountFrequency (%)
jd117
14.4%
high115
14.1%
school115
14.1%
md108
13.3%
associate104
12.8%
college90
11.0%
masters90
11.0%
phd76
9.3%
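
The word-level table above can be reproduced from the category counts, assuming the words were obtained by lower-casing each value and splitting it on whitespace. That tokenisation is an assumption inferred from the table ("High School" appears as "high" and "school"), not something the report states. A minimal sketch:

```python
from collections import Counter

# Category counts copied from the value table for this variable.
category_counts = {
    "JD": 117,
    "High School": 115,
    "MD": 108,
    "Associate": 104,
    "College": 90,
    "Masters": 90,
    "PhD": 76,
}

# Assumed tokenisation: lower-case each value and split on whitespace.
word_counts = Counter()
for category, count in category_counts.items():
    for word in category.lower().split():
        word_counts[word] += count

# Frequencies are relative to the number of words, not the number of rows.
total_words = sum(word_counts.values())
word_pct = {
    word: round(100 * count / total_words, 1)
    for word, count in word_counts.items()
}

for word, count in word_counts.most_common():
    print(f"{word}: {count} ({word_pct[word]}%)")
```

Under this assumption the denominator is 815 words rather than 700 rows, which is why "jd" shows 14.4% here but "JD" shows 16.7% in the value table.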

Most occurring characters

ValueCountFrequency (%)
o424
 
10.2%
s388
 
9.4%
e374
 
9.0%
h306
 
7.4%
D301
 
7.3%
l295
 
7.1%
i219
 
5.3%
c219
 
5.3%
g205
 
5.0%
M198
 
4.8%
Other values (10)1210
29.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2908
70.3%
Uppercase Letter1116
 
27.0%
Space Separator115
 
2.8%

Most frequent character per category

ValueCountFrequency (%)
o424
14.6%
s388
13.3%
e374
12.9%
h306
10.5%
l295
10.1%
i219
7.5%
c219
7.5%
g205
7.0%
a194
6.7%
t194
6.7%
ValueCountFrequency (%)
D301
27.0%
M198
17.7%
J117
 
10.5%
H115
 
10.3%
S115
 
10.3%
A104
 
9.3%
C90
 
8.1%
P76
 
6.8%
ValueCountFrequency (%)
115
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4024
97.2%
Common115
 
2.8%

Most frequent character per script

ValueCountFrequency (%)
o424
 
10.5%
s388
 
9.6%
e374
 
9.3%
h306
 
7.6%
D301
 
7.5%
l295
 
7.3%
i219
 
5.4%
c219
 
5.4%
g205
 
5.1%
M198
 
4.9%
Other values (9)1095
27.2%
ValueCountFrequency (%)
115
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4139
100.0%

Most frequent character per block

ValueCountFrequency (%)
o424
 
10.2%
s388
 
9.4%
e374
 
9.0%
h306
 
7.4%
D301
 
7.3%
l295
 
7.1%
i219
 
5.3%
c219
 
5.3%
g205
 
5.0%
M198
 
4.8%
Other values (10)1210
29.2%
Distinct14
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
machine-op-inspct
72 
exec-managerial
57 
tech-support
56 
prof-specialty
55 
other-service
52 
Other values (9)
408 

Length

Max length17
Median length14
Mean length13.55714286
Min length5

Characters and Unicode

Total characters9490
Distinct characters21
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowfarming-fishing
2nd rowprotective-serv
3rd rowhandlers-cleaners
4th rowhandlers-cleaners
5th rowpriv-house-serv
ValueCountFrequency (%)
machine-op-inspct72
10.3%
exec-managerial57
 
8.1%
tech-support56
 
8.0%
prof-specialty55
 
7.9%
other-service52
 
7.4%
craft-repair50
 
7.1%
sales50
 
7.1%
armed-forces49
 
7.0%
adm-clerical48
 
6.9%
protective-serv48
 
6.9%
Other values (4)163
23.3%
Histogram of lengths of the category
ValueCountFrequency (%)
machine-op-inspct72
10.3%
exec-managerial57
 
8.1%
tech-support56
 
8.0%
prof-specialty55
 
7.9%
other-service52
 
7.4%
craft-repair50
 
7.1%
sales50
 
7.1%
armed-forces49
 
7.0%
adm-clerical48
 
6.9%
protective-serv48
 
6.9%
Other values (4)163
23.3%

Most occurring characters

ValueCountFrequency (%)
e1090
11.5%
r951
 
10.0%
-766
 
8.1%
a746
 
7.9%
s673
 
7.1%
i661
 
7.0%
c641
 
6.8%
p554
 
5.8%
t529
 
5.6%
o468
 
4.9%
Other values (11)2411
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8724
91.9%
Dash Punctuation766
 
8.1%

Most frequent character per category

ValueCountFrequency (%)
e1090
12.5%
r951
10.9%
a746
 
8.6%
s673
 
7.7%
i661
 
7.6%
c641
 
7.3%
p554
 
6.4%
t529
 
6.1%
o468
 
5.4%
n439
 
5.0%
Other values (10)1972
22.6%
ValueCountFrequency (%)
-766
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8724
91.9%
Common766
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
e1090
12.5%
r951
10.9%
a746
 
8.6%
s673
 
7.7%
i661
 
7.6%
c641
 
7.3%
p554
 
6.4%
t529
 
6.1%
o468
 
5.4%
n439
 
5.0%
Other values (10)1972
22.6%
ValueCountFrequency (%)
-766
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9490
100.0%

Most frequent character per block

ValueCountFrequency (%)
e1090
11.5%
r951
 
10.0%
-766
 
8.1%
a746
 
7.9%
s673
 
7.1%
i661
 
7.0%
c641
 
6.8%
p554
 
5.8%
t529
 
5.6%
o468
 
4.9%
Other values (11)2411
25.4%

insured_hobbies
Categorical

Distinct20
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
camping
 
45
reading
 
44
exercise
 
43
hiking
 
41
yachting
 
40
Other values (15)
487 

Length

Max length14
Median length8
Mean length8.107142857
Min length4

Characters and Unicode

Total characters5675
Distinct characters24
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowpaintball
2nd rowyachting
3rd rowgolf
4th rowhiking
5th rowexercise
ValueCountFrequency (%)
camping45
 
6.4%
reading44
 
6.3%
exercise43
 
6.1%
hiking41
 
5.9%
yachting40
 
5.7%
paintball38
 
5.4%
golf38
 
5.4%
bungie-jumping37
 
5.3%
kayaking37
 
5.3%
base-jumping37
 
5.3%
Other values (10)300
42.9%
Histogram of lengths of the category
ValueCountFrequency (%)
camping45
 
6.4%
reading44
 
6.3%
exercise43
 
6.1%
hiking41
 
5.9%
yachting40
 
5.7%
paintball38
 
5.4%
golf38
 
5.4%
bungie-jumping37
 
5.3%
kayaking37
 
5.3%
base-jumping37
 
5.3%
Other values (10)300
42.9%

Most occurring characters

ValueCountFrequency (%)
i658
 
11.6%
g513
 
9.0%
a488
 
8.6%
e486
 
8.6%
n478
 
8.4%
s381
 
6.7%
o221
 
3.9%
m219
 
3.9%
c217
 
3.8%
p213
 
3.8%
Other values (14)1801
31.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5509
97.1%
Dash Punctuation166
 
2.9%

Most frequent character per category

ValueCountFrequency (%)
i658
11.9%
g513
 
9.3%
a488
 
8.9%
e486
 
8.8%
n478
 
8.7%
s381
 
6.9%
o221
 
4.0%
m219
 
4.0%
c217
 
3.9%
p213
 
3.9%
Other values (13)1635
29.7%
ValueCountFrequency (%)
-166
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5509
97.1%
Common166
 
2.9%

Most frequent character per script

ValueCountFrequency (%)
i658
11.9%
g513
 
9.3%
a488
 
8.9%
e486
 
8.8%
n478
 
8.7%
s381
 
6.9%
o221
 
4.0%
m219
 
4.0%
c217
 
3.9%
p213
 
3.9%
Other values (13)1635
29.7%
ValueCountFrequency (%)
-166
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII5675
100.0%

Most frequent character per block

ValueCountFrequency (%)
i658
 
11.6%
g513
 
9.0%
a488
 
8.6%
e486
 
8.6%
n478
 
8.4%
s381
 
6.7%
o221
 
3.9%
m219
 
3.9%
c217
 
3.8%
p213
 
3.8%
Other values (14)1801
31.7%
Distinct6
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
own-child
127 
other-relative
121 
not-in-family
119 
husband
116 
wife
116 

Length

Max length14
Median length9
Mean length9.384285714
Min length4

Characters and Unicode

Total characters6569
Distinct characters20
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowother-relative
2nd rownot-in-family
3rd rownot-in-family
4th rowhusband
5th rownot-in-family
ValueCountFrequency (%)
own-child127
18.1%
other-relative121
17.3%
not-in-family119
17.0%
husband116
16.6%
wife116
16.6%
unmarried101
14.4%
Histogram of lengths of the category
ValueCountFrequency (%)
own-child127
18.1%
other-relative121
17.3%
not-in-family119
17.0%
husband116
16.6%
wife116
16.6%
unmarried101
14.4%

Most occurring characters

ValueCountFrequency (%)
i703
 
10.7%
n582
 
8.9%
e580
 
8.8%
-486
 
7.4%
a457
 
7.0%
r444
 
6.8%
o367
 
5.6%
l367
 
5.6%
h364
 
5.5%
t361
 
5.5%
Other values (10)1858
28.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6083
92.6%
Dash Punctuation486
 
7.4%

Most frequent character per category

ValueCountFrequency (%)
i703
11.6%
n582
 
9.6%
e580
 
9.5%
a457
 
7.5%
r444
 
7.3%
o367
 
6.0%
l367
 
6.0%
h364
 
6.0%
t361
 
5.9%
d344
 
5.7%
Other values (9)1514
24.9%
ValueCountFrequency (%)
-486
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6083
92.6%
Common486
 
7.4%

Most frequent character per script

ValueCountFrequency (%)
i703
11.6%
n582
 
9.6%
e580
 
9.5%
a457
 
7.5%
r444
 
7.3%
o367
 
6.0%
l367
 
6.0%
h364
 
6.0%
t361
 
5.9%
d344
 
5.7%
Other values (9)1514
24.9%
ValueCountFrequency (%)
-486
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6569
100.0%

Most frequent character per block

ValueCountFrequency (%)
i703
 
10.7%
n582
 
8.9%
e580
 
8.8%
-486
 
7.4%
a457
 
7.0%
r444
 
6.8%
o367
 
5.6%
l367
 
5.6%
h364
 
5.5%
t361
 
5.5%
Other values (10)1858
28.3%

capital-gains
Real number (ℝ≥0)

ZEROS

Distinct: 269
Distinct (%): 38.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25777.57143
Minimum: 0
Maximum: 98800
Zeros: 350
Zeros (%): 50.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 5000
Q3: 52200
95-th percentile: 71205
Maximum: 98800
Range: 98800
Interquartile range (IQR): 52200

Descriptive statistics

Standard deviation: 28239.30078
Coefficient of variation (CV): 1.095498886
Kurtosis: -1.334480834
Mean: 25777.57143
Median Absolute Deviation (MAD): 5000
Skewness: 0.451444617
Sum: 18044300
Variance: 797458108.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 350 | 50.0%
68500 | 4 | 0.6%
46300 | 4 | 0.6%
29300 | 3 | 0.4%
52600 | 3 | 0.4%
45700 | 3 | 0.4%
63100 | 3 | 0.4%
51100 | 3 | 0.4%
75800 | 3 | 0.4%
63600 | 3 | 0.4%
Other values (259) | 321 | 45.9%
Value | Count | Frequency (%)
0 | 350 | 50.0%
10000 | 1 | 0.1%
11000 | 1 | 0.1%
12100 | 1 | 0.1%
12800 | 1 | 0.1%
Value | Count | Frequency (%)
98800 | 1 | 0.1%
91900 | 1 | 0.1%
90700 | 1 | 0.1%
88800 | 1 | 0.1%
88400 | 1 | 0.1%

capital-loss
Real number (ℝ)

ZEROS

Distinct: 288
Distinct (%): 41.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -27061
Minimum: -111100
Maximum: 0
Zeros: 326
Zeros (%): 46.6%
Memory size: 5.6 KiB

Quantile statistics

Minimum: -111100
5-th percentile: -71405
Q1: -51825
Median: -27450
Q3: 0
95-th percentile: 0
Maximum: 0
Range: 111100
Interquartile range (IQR): 51825

Descriptive statistics

Standard deviation: 27874.24256
Coefficient of variation (CV): -1.030052199
Kurtosis: -1.353650599
Mean: -27061
Median Absolute Deviation (MAD): 27450
Skewness: -0.3479678515
Sum: -18942700
Variance: 776973398.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 326 | 46.6%
-50300 | 4 | 0.6%
-53700 | 4 | 0.6%
-31400 | 4 | 0.6%
-53800 | 4 | 0.6%
-31700 | 4 | 0.6%
-49200 | 3 | 0.4%
-67800 | 3 | 0.4%
-51000 | 3 | 0.4%
-55600 | 3 | 0.4%
Other values (278) | 342 | 48.9%
Value | Count | Frequency (%)
-111100 | 1 | 0.1%
-91400 | 1 | 0.1%
-90200 | 1 | 0.1%
-89400 | 1 | 0.1%
-88300 | 1 | 0.1%
Value | Count | Frequency (%)
0 | 326 | 46.6%
-5700 | 1 | 0.1%
-6300 | 1 | 0.1%
-8500 | 1 | 0.1%
-10600 | 1 | 0.1%

policy_number
Real number (ℝ≥0)

UNIQUE

Distinct: 700
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 551898.9771
Minimum: 100804
Maximum: 998865
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 100804
5-th percentile: 140921.75
Q1: 337547.25
Median: 547773
Q3: 775554.5
95-th percentile: 964683.5
Maximum: 998865
Range: 898061
Interquartile range (IQR): 438007.25

Descriptive statistics

Standard deviation: 260076.7729
Coefficient of variation (CV): 0.4712398169
Kurtosis: -1.147347575
Mean: 551898.9771
Median Absolute Deviation (MAD): 215634
Skewness: 0.0230165729
Sum: 386329284
Variance: 6.763992781 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
296960 | 1 | 0.1%
620215 | 1 | 0.1%
810189 | 1 | 0.1%
139484 | 1 | 0.1%
140977 | 1 | 0.1%
326322 | 1 | 0.1%
489618 | 1 | 0.1%
743092 | 1 | 0.1%
674485 | 1 | 0.1%
419510 | 1 | 0.1%
Other values (690) | 690 | 98.6%
Value | Count | Frequency (%)
100804 | 1 | 0.1%
101421 | 1 | 0.1%
106873 | 1 | 0.1%
107181 | 1 | 0.1%
108270 | 1 | 0.1%
Value | Count | Frequency (%)
998865 | 1 | 0.1%
998192 | 1 | 0.1%
996850 | 1 | 0.1%
996253 | 1 | 0.1%
994538 | 1 | 0.1%

policy_bind_date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct671
Distinct (%)95.9%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
2006-01-01
 
3
1992-04-28
 
3
2007-05-06
 
2
2010-01-28
 
2
1997-07-14
 
2
Other values (666)
688 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters7000
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique644 ?
Unique (%)92.0%

Sample

1st row2013-11-11
2nd row2005-12-09
3rd row2001-11-29
4th row2012-10-09
5th row2004-01-02
ValueCountFrequency (%)
2006-01-013
 
0.4%
1992-04-283
 
0.4%
2007-05-062
 
0.3%
2010-01-282
 
0.3%
1997-07-142
 
0.3%
2000-06-042
 
0.3%
1993-08-302
 
0.3%
2013-12-252
 
0.3%
1997-11-072
 
0.3%
1995-12-072
 
0.3%
Other values (661)678
96.9%
Histogram of lengths of the category
ValueCountFrequency (%)
2006-01-013
 
0.4%
1992-04-283
 
0.4%
2007-05-062
 
0.3%
2010-01-282
 
0.3%
1997-07-142
 
0.3%
2000-06-042
 
0.3%
1993-08-302
 
0.3%
2013-12-252
 
0.3%
1997-11-072
 
0.3%
1995-12-072
 
0.3%
Other values (661)678
96.9%

Most occurring characters

ValueCountFrequency (%)
01606
22.9%
-1400
20.0%
11125
16.1%
2899
12.8%
9803
11.5%
4213
 
3.0%
3210
 
3.0%
8198
 
2.8%
7189
 
2.7%
6183
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5600
80.0%
Dash Punctuation1400
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
01606
28.7%
11125
20.1%
2899
16.1%
9803
14.3%
4213
 
3.8%
3210
 
3.8%
8198
 
3.5%
7189
 
3.4%
6183
 
3.3%
5174
 
3.1%
ValueCountFrequency (%)
-1400
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common7000
100.0%

Most frequent character per script

ValueCountFrequency (%)
01606
22.9%
-1400
20.0%
11125
16.1%
2899
12.8%
9803
11.5%
4213
 
3.0%
3210
 
3.0%
8198
 
2.8%
7189
 
2.7%
6183
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII7000
100.0%

Most frequent character per block

ValueCountFrequency (%)
01606
22.9%
-1400
20.0%
11125
16.1%
2899
12.8%
9803
11.5%
4213
 
3.0%
3210
 
3.0%
8198
 
2.8%
7189
 
2.7%
6183
 
2.6%

policy_state
Categorical

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
IL
241 
OH
240 
IN
219 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters1400
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowOH
2nd rowIN
3rd rowIN
4th rowIN
5th rowIL
ValueCountFrequency (%)
IL241
34.4%
OH240
34.3%
IN219
31.3%
Histogram of lengths of the category
ValueCountFrequency (%)
il241
34.4%
oh240
34.3%
in219
31.3%

Most occurring characters

ValueCountFrequency (%)
I460
32.9%
L241
17.2%
O240
17.1%
H240
17.1%
N219
15.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1400
100.0%

Most frequent character per category

ValueCountFrequency (%)
I460
32.9%
L241
17.2%
O240
17.1%
H240
17.1%
N219
15.6%

Most occurring scripts

ValueCountFrequency (%)
Latin1400
100.0%

Most frequent character per script

ValueCountFrequency (%)
I460
32.9%
L241
17.2%
O240
17.1%
H240
17.1%
N219
15.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1400
100.0%

Most frequent character per block

ValueCountFrequency (%)
I460
32.9%
L241
17.2%
O240
17.1%
H240
17.1%
N219
15.6%

policy_csl
Categorical

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
250/500
241 
100/300
237 
500/1000
222 

Length

Max length8
Median length7
Mean length7.317142857
Min length7

Characters and Unicode

Total characters5122
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row250/500
2nd row500/1000
3rd row500/1000
4th row500/1000
5th row100/300
ValueCountFrequency (%)
250/500241
34.4%
100/300237
33.9%
500/1000222
31.7%
Histogram of lengths of the category
ValueCountFrequency (%)
250/500241
34.4%
100/300237
33.9%
500/1000222
31.7%

Most occurring characters

ValueCountFrequency (%)
02781
54.3%
5704
 
13.7%
/700
 
13.7%
1459
 
9.0%
2241
 
4.7%
3237
 
4.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4422
86.3%
Other Punctuation700
 
13.7%

Most frequent character per category

ValueCountFrequency (%)
02781
62.9%
5704
 
15.9%
1459
 
10.4%
2241
 
5.5%
3237
 
5.4%
ValueCountFrequency (%)
/700
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common5122
100.0%

Most frequent character per script

ValueCountFrequency (%)
02781
54.3%
5704
 
13.7%
/700
 
13.7%
1459
 
9.0%
2241
 
4.7%
3237
 
4.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII5122
100.0%

Most frequent character per block

ValueCountFrequency (%)
02781
54.3%
5704
 
13.7%
/700
 
13.7%
1459
 
9.0%
2241
 
4.7%
3237
 
4.6%
Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
500
241 
1000
239 
2000
220 

Length

Max length4
Median length4
Mean length3.655714286
Min length3

Characters and Unicode

Total characters2559
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1000
2nd row2000
3rd row500
4th row500
5th row2000
ValueCountFrequency (%)
500241
34.4%
1000239
34.1%
2000220
31.4%
Histogram of lengths of the category
ValueCountFrequency (%)
500241
34.4%
1000239
34.1%
2000220
31.4%

Most occurring characters

ValueCountFrequency (%)
01859
72.6%
5241
 
9.4%
1239
 
9.3%
2220
 
8.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2559
100.0%

Most frequent character per category

ValueCountFrequency (%)
01859
72.6%
5241
 
9.4%
1239
 
9.3%
2220
 
8.6%

Most occurring scripts

ValueCountFrequency (%)
Common2559
100.0%

Most frequent character per script

ValueCountFrequency (%)
01859
72.6%
5241
 
9.4%
1239
 
9.3%
2220
 
8.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII2559
100.0%

Most frequent character per block

ValueCountFrequency (%)
01859
72.6%
5241
 
9.4%
1239
 
9.3%
2220
 
8.6%

incident_location
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct700
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
2725 Britain Ridge
 
1
6574 4th Drive
 
1
1738 Solo Lane
 
1
3808 5th Ave
 
1
5769 Texas Lane
 
1
Other values (695)
695 

Length

Max length23
Median length14
Mean length14.79
Min length11

Characters and Unicode

Total characters10353
Distinct characters49
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique700 ?
Unique (%)100.0%

Sample

1st row6303 1st Drive
2nd row5585 Washington Drive
3rd row1328 Texas Lane
4th row6117 4th Ave
5th row2272 Embaracadero Drive
ValueCountFrequency (%)
2725 Britain Ridge1
 
0.1%
6574 4th Drive1
 
0.1%
1738 Solo Lane1
 
0.1%
3808 5th Ave1
 
0.1%
5769 Texas Lane1
 
0.1%
5483 Francis Drive1
 
0.1%
9070 Tree Ave1
 
0.1%
3982 Washington Hwy1
 
0.1%
7897 Lincoln St1
 
0.1%
2048 3rd Ridge1
 
0.1%
Other values (690)690
98.6%
Histogram of lengths of the category
ValueCountFrequency (%)
drive122
 
5.8%
st121
 
5.8%
ave120
 
5.7%
lane118
 
5.6%
ridge117
 
5.6%
hwy102
 
4.9%
4th41
 
2.0%
5th35
 
1.7%
texas34
 
1.6%
mlk33
 
1.6%
Other values (695)1257
59.9%

Most occurring characters

ValueCountFrequency (%)
1400
 
13.5%
e885
 
8.5%
i438
 
4.2%
a430
 
4.2%
n357
 
3.4%
r348
 
3.4%
5331
 
3.2%
t328
 
3.2%
4314
 
3.0%
3310
 
3.0%
Other values (39)5212
50.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4687
45.3%
Decimal Number2931
28.3%
Space Separator1400
 
13.5%
Uppercase Letter1335
 
12.9%

Most frequent character per category

ValueCountFrequency (%)
e885
18.9%
i438
9.3%
a430
9.2%
n357
 
7.6%
r348
 
7.4%
t328
 
7.0%
v274
 
5.8%
d224
 
4.8%
o186
 
4.0%
h149
 
3.2%
Other values (12)1068
22.8%
ValueCountFrequency (%)
L178
13.3%
A167
12.5%
S164
12.3%
R145
10.9%
D122
9.1%
H102
7.6%
T61
 
4.6%
M60
 
4.5%
F59
 
4.4%
W58
 
4.3%
Other values (6)219
16.4%
ValueCountFrequency (%)
5331
11.3%
4314
10.7%
3310
10.6%
1305
10.4%
2302
10.3%
8300
10.2%
7296
10.1%
9296
10.1%
6272
9.3%
0205
7.0%
ValueCountFrequency (%)
1400
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6022
58.2%
Common4331
41.8%

Most frequent character per script

ValueCountFrequency (%)
e885
 
14.7%
i438
 
7.3%
a430
 
7.1%
n357
 
5.9%
r348
 
5.8%
t328
 
5.4%
v274
 
4.5%
d224
 
3.7%
o186
 
3.1%
L178
 
3.0%
Other values (28)2374
39.4%
ValueCountFrequency (%)
1400
32.3%
5331
 
7.6%
4314
 
7.3%
3310
 
7.2%
1305
 
7.0%
2302
 
7.0%
8300
 
6.9%
7296
 
6.8%
9296
 
6.8%
6272
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII10353
100.0%

Most frequent character per block

ValueCountFrequency (%)
1400
 
13.5%
e885
 
8.5%
i438
 
4.2%
a430
 
4.2%
n357
 
3.4%
r348
 
3.4%
5331
 
3.2%
t328
 
3.2%
4314
 
3.0%
3310
 
3.0%
Other values (39)5212
50.3%

incident_hour_of_the_day
Real number (ℝ≥0)

ZEROS

Distinct: 24
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.74714286
Minimum: 0
Maximum: 23
Zeros: 40
Zeros (%): 5.7%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
Median: 12
Q3: 17.25
95-th percentile: 23
Maximum: 23
Range: 23
Interquartile range (IQR): 11.25

Descriptive statistics

Standard deviation: 6.987444727
Coefficient of variation (CV): 0.5948207843
Kurtosis: -1.186013207
Mean: 11.74714286
Median Absolute Deviation (MAD): 6
Skewness: -0.07359909252
Sum: 8223
Variance: 48.82438381
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%)
17 | 41 | 5.9%
0 | 40 | 5.7%
23 | 36 | 5.1%
21 | 35 | 5.0%
3 | 34 | 4.9%
16 | 33 | 4.7%
12 | 33 | 4.7%
13 | 32 | 4.6%
9 | 29 | 4.1%
10 | 29 | 4.1%
Other values (14) | 358 | 51.1%
Value | Count | Frequency (%)
0 | 40 | 5.7%
1 | 21 | 3.0%
2 | 21 | 3.0%
3 | 34 | 4.9%
4 | 28 | 4.0%
Value | Count | Frequency (%)
23 | 36 | 5.1%
22 | 24 | 3.4%
21 | 35 | 5.0%
20 | 27 | 3.9%
19 | 27 | 3.9%
Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
1
412 
3
253 
4
 
19
2
 
16

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters700
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row3
4th row1
5th row3
ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%
Histogram of lengths of the category
ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%

Most occurring characters

ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number700
100.0%

Most frequent character per category

ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%

Most occurring scripts

ValueCountFrequency (%)
Common700
100.0%

Most frequent character per script

ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII700
100.0%

Most frequent character per block

ValueCountFrequency (%)
1412
58.9%
3253
36.1%
419
 
2.7%
216
 
2.3%

property_damage
Categorical

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
?
255 
NO
224 
YES
221 

Length

Max length3
Median length2
Mean length1.951428571
Min length1

Characters and Unicode

Total characters1366
Distinct characters6
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row?
2nd rowNO
3rd rowNO
4th row?
5th rowYES
ValueCountFrequency (%)
?255
36.4%
NO224
32.0%
YES221
31.6%
Histogram of lengths of the category
ValueCountFrequency (%)
255
36.4%
no224
32.0%
yes221
31.6%

Most occurring characters

ValueCountFrequency (%)
?255
18.7%
N224
16.4%
O224
16.4%
Y221
16.2%
E221
16.2%
S221
16.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1111
81.3%
Other Punctuation255
 
18.7%

Most frequent character per category

ValueCountFrequency (%)
N224
20.2%
O224
20.2%
Y221
19.9%
E221
19.9%
S221
19.9%
ValueCountFrequency (%)
?255
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1111
81.3%
Common255
 
18.7%

Most frequent character per script

ValueCountFrequency (%)
N224
20.2%
O224
20.2%
Y221
19.9%
E221
19.9%
S221
19.9%
ValueCountFrequency (%)
?255
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1366
100.0%

Most frequent character per block

ValueCountFrequency (%)
?255
18.7%
N224
16.4%
O224
16.4%
Y221
16.2%
E221
16.2%
S221
16.2%
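
The character-category totals reported for property_damage above can be reproduced from its three value counts. The sketch below is an illustration only: it uses Python's standard `unicodedata` module rather than whatever the report generator uses internally, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

# Value counts copied from the property_damage frequency table above.
value_counts = {"?": 255, "NO": 224, "YES": 221}


def category_counts(counts):
    """Count characters per Unicode general category, weighted by value frequency."""
    totals = Counter()
    for value, freq in counts.items():
        for ch in value:
            totals[unicodedata.category(ch)] += freq
    return totals


totals = category_counts(value_counts)

# "Lu" is the Unicode code for Uppercase Letter, "Po" for Other Punctuation.
print("Uppercase Letter:", totals["Lu"])
print("Other Punctuation:", totals["Po"])
print("Total characters:", sum(totals.values()))
```

The two category totals sum to the "Total characters" figure listed for the variable, which is a quick consistency check on the table.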

bodily_injuries
Categorical

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
2
235 
0
234 
1
231 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters700
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row2
ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%
Histogram of lengths of the category
ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%

Most occurring characters

ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number700
100.0%

Most frequent character per category

ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%

Most occurring scripts

ValueCountFrequency (%)
Common700
100.0%

Most frequent character per script

ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII700
100.0%

Most frequent character per block

ValueCountFrequency (%)
2235
33.6%
0234
33.4%
1231
33.0%

policy_annual_premium
Real number (ℝ≥0)

Distinct694
Distinct (%)99.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1256.950357
Minimum433.33
Maximum2047.59
Zeros0
Zeros (%)0.0%
Memory size5.6 KiB

Quantile statistics

Minimum433.33
5-th percentile840.943
Q11084.7025
median1256.34
Q31423.89
95-th percentile1653.4435
Maximum2047.59
Range1614.26
Interquartile range (IQR)339.1875

Descriptive statistics

Standard deviation249.6168023
Coefficient of variation (CV)0.198589229
Kurtosis0.1096787666
Mean1256.950357
Median Absolute Deviation (MAD)169.995
Skewness-0.05557273268
Sum879865.25
Variance62308.54801
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1281.252
 
0.3%
1374.222
 
0.3%
1389.132
 
0.3%
1215.362
 
0.3%
1073.832
 
0.3%
1524.452
 
0.3%
1124.431
 
0.1%
1260.561
 
0.1%
1356.641
 
0.1%
1151.391
 
0.1%
Other values (684)684
97.7%
ValueCountFrequency (%)
433.331
0.1%
484.671
0.1%
538.171
0.1%
566.111
0.1%
617.111
0.1%
ValueCountFrequency (%)
2047.591
0.1%
1969.631
0.1%
1922.841
0.1%
1896.911
0.1%
1865.831
0.1%

umbrella_limit
Real number (ℝ≥0)

ZEROS

Distinct10
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1092857.143
Minimum0
Maximum10000000
Zeros561
Zeros (%)80.1%
Memory size5.6 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile6000000
Maximum10000000
Range10000000
Interquartile range (IQR)0

Descriptive statistics

Standard deviation2289793.328
Coefficient of variation (CV)2.095235725
Kurtosis1.747595947
Mean1092857.143
Median Absolute Deviation (MAD)0
Skewness1.807292601
Sum765000000
Variance5.243153485 × 10^12
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
0561
80.1%
600000037
 
5.3%
500000034
 
4.9%
400000025
 
3.6%
700000024
 
3.4%
30000008
 
1.1%
80000005
 
0.7%
90000003
 
0.4%
20000002
 
0.3%
100000001
 
0.1%
ValueCountFrequency (%)
0561
80.1%
20000002
 
0.3%
30000008
 
1.1%
400000025
 
3.6%
500000034
 
4.9%
ValueCountFrequency (%)
100000001
 
0.1%
90000003
 
0.4%
80000005
 
0.7%
700000024
3.4%
600000037
5.3%
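
Because the umbrella_limit table lists all ten distinct values, its summary statistics can be rebuilt from the counts alone. The following is a minimal sketch, assuming pandas is available and that the report uses the sample variance (ddof=1, the pandas default); that assumption reproduces the figures shown above. The variable names are illustrative.

```python
import pandas as pd

# Frequencies copied from the umbrella_limit value table above.
value_counts = {
    0: 561, 6_000_000: 37, 5_000_000: 34, 4_000_000: 25, 7_000_000: 24,
    3_000_000: 8, 8_000_000: 5, 9_000_000: 3, 2_000_000: 2, 10_000_000: 1,
}

# Expand the counts back into one value per observation (700 rows).
umbrella = pd.Series(
    [value for value, freq in value_counts.items() for _ in range(freq)],
    name="umbrella_limit",
)

summary = {
    "n": len(umbrella),
    "sum": int(umbrella.sum()),
    "mean": umbrella.mean(),
    "variance": umbrella.var(),  # sample variance, ddof=1
    "std": umbrella.std(),
    "cv": umbrella.std() / umbrella.mean(),  # coefficient of variation
    "zeros_pct": (umbrella == 0).mean() * 100,
    "q95": umbrella.quantile(0.95),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With 80.1% of rows at zero, the median, Q1 and Q3 are all 0, so the mean and the 95th percentile describe only the small group of policies that carry an umbrella limit.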

insured_zip
Real number (ℝ≥0)

Distinct697
Distinct (%)99.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean500211.26
Minimum430104
Maximum620869
Zeros0
Zeros (%)0.0%
Memory size5.6 KiB

Quantile statistics

Minimum430104
5-th percentile433587
Q1446952
median465565
Q3603417.5
95-th percentile617740.75
Maximum620869
Range190765
Interquartile range (IQR)156465.5

Descriptive statistics

Standard deviation71731.67763
Coefficient of variation (CV)0.1434027647
Kurtosis-1.156487335
Mean500211.26
Median Absolute Deviation (MAD)21260
Skewness0.8383224898
Sum350147882
Variance5145433575
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4776952
 
0.3%
4566022
 
0.3%
4312022
 
0.3%
4532771
 
0.1%
4747711
 
0.1%
4727241
 
0.1%
4717041
 
0.1%
4522491
 
0.1%
4532741
 
0.1%
6159211
 
0.1%
Other values (687)687
98.1%
ValueCountFrequency (%)
4301041
0.1%
4301411
0.1%
4302321
0.1%
4303801
0.1%
4305671
0.1%
ValueCountFrequency (%)
6208691
0.1%
6208191
0.1%
6207571
0.1%
6207371
0.1%
6205071
0.1%

incident_date
Categorical

HIGH CARDINALITY

Distinct60
Distinct (%)8.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
2015-02-17
 
21
2015-02-02
 
19
2015-01-07
 
17
2015-02-04
 
17
2015-01-08
 
16
Other values (55)
610 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters7000
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2015-02-18
2nd row2015-02-19
3rd row2015-01-31
4th row2015-02-05
5th row2015-01-03
ValueCountFrequency (%)
2015-02-1721
 
3.0%
2015-02-0219
 
2.7%
2015-01-0717
 
2.4%
2015-02-0417
 
2.4%
2015-01-0816
 
2.3%
2015-01-1916
 
2.3%
2015-02-2215
 
2.1%
2015-01-2415
 
2.1%
2015-01-0315
 
2.1%
2015-01-2115
 
2.1%
Other values (50)534
76.3%
Histogram of lengths of the category
ValueCountFrequency (%)
2015-02-1721
 
3.0%
2015-02-0219
 
2.7%
2015-01-0717
 
2.4%
2015-02-0417
 
2.4%
2015-01-0816
 
2.3%
2015-01-1916
 
2.3%
2015-02-2215
 
2.1%
2015-01-2415
 
2.1%
2015-01-0315
 
2.1%
2015-01-2115
 
2.1%
Other values (50)534
76.3%

Most occurring characters

ValueCountFrequency (%)
01684
24.1%
-1400
20.0%
11388
19.8%
21319
18.8%
5759
10.8%
3112
 
1.6%
778
 
1.1%
476
 
1.1%
872
 
1.0%
663
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5600
80.0%
Dash Punctuation1400
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
01684
30.1%
11388
24.8%
21319
23.6%
5759
13.6%
3112
 
2.0%
778
 
1.4%
476
 
1.4%
872
 
1.3%
663
 
1.1%
949
 
0.9%
ValueCountFrequency (%)
-1400
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common7000
100.0%

Most frequent character per script

ValueCountFrequency (%)
01684
24.1%
-1400
20.0%
11388
19.8%
21319
18.8%
5759
10.8%
3112
 
1.6%
778
 
1.1%
476
 
1.1%
872
 
1.0%
663
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII7000
100.0%

Most frequent character per block

ValueCountFrequency (%)
01684
24.1%
-1400
20.0%
11388
19.8%
21319
18.8%
5759
10.8%
3112
 
1.6%
778
 
1.1%
476
 
1.1%
872
 
1.0%
663
 
0.9%
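
incident_date is profiled as a high-cardinality categorical because its values are stored as text. One possible follow-up, sketched here under the assumption that every value follows the YYYY-MM-DD pattern seen in the sample rows, is to parse the column with `pandas.to_datetime` before analysis. Only the five sample rows from the report are used.

```python
import pandas as pd

# The five sample rows shown for incident_date above.
sample = pd.Series(
    ["2015-02-18", "2015-02-19", "2015-01-31", "2015-02-05", "2015-01-03"],
    name="incident_date",
)

# An explicit format avoids ambiguous day/month inference.
parsed = pd.to_datetime(sample, format="%Y-%m-%d")

print(parsed.dt.month.tolist())
print(parsed.min(), parsed.max())
```

Once parsed, the column supports date arithmetic and grouping by month or weekday, which a string column does not.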

incident_type
Categorical

Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Single Vehicle Collision
295 
Multi-vehicle Collision
288 
Vehicle Theft
60 
Parked Car
57 

Length

Max length24
Median length23
Mean length21.50571429
Min length10

Characters and Unicode

Total characters15054
Distinct characters25
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowParked Car
2nd rowSingle Vehicle Collision
3rd rowMulti-vehicle Collision
4th rowSingle Vehicle Collision
5th rowMulti-vehicle Collision
ValueCountFrequency (%)
Single Vehicle Collision295
42.1%
Multi-vehicle Collision288
41.1%
Vehicle Theft60
 
8.6%
Parked Car57
 
8.1%
Histogram of lengths of the category
ValueCountFrequency (%)
collision583
34.4%
vehicle355
20.9%
single295
17.4%
multi-vehicle288
17.0%
theft60
 
3.5%
parked57
 
3.4%
car57
 
3.4%

Most occurring characters

ValueCountFrequency (%)
i2392
15.9%
l2392
15.9%
e1698
11.3%
o1166
 
7.7%
995
 
6.6%
n878
 
5.8%
h703
 
4.7%
c643
 
4.3%
C640
 
4.3%
s583
 
3.9%
Other values (15)2964
19.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12076
80.2%
Uppercase Letter1695
 
11.3%
Space Separator995
 
6.6%
Dash Punctuation288
 
1.9%

Most frequent character per category

ValueCountFrequency (%)
i2392
19.8%
l2392
19.8%
e1698
14.1%
o1166
9.7%
n878
 
7.3%
h703
 
5.8%
c643
 
5.3%
s583
 
4.8%
t348
 
2.9%
g295
 
2.4%
Other values (7)978
8.1%
ValueCountFrequency (%)
C640
37.8%
V355
20.9%
S295
17.4%
M288
17.0%
T60
 
3.5%
P57
 
3.4%
ValueCountFrequency (%)
995
100.0%
ValueCountFrequency (%)
-288
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin13771
91.5%
Common1283
 
8.5%

Most frequent character per script

ValueCountFrequency (%)
i2392
17.4%
l2392
17.4%
e1698
12.3%
o1166
8.5%
n878
 
6.4%
h703
 
5.1%
c643
 
4.7%
C640
 
4.6%
s583
 
4.2%
V355
 
2.6%
Other values (13)2321
16.9%
ValueCountFrequency (%)
995
77.6%
-288
 
22.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII15054
100.0%

Most frequent character per block

ValueCountFrequency (%)
i2392
15.9%
l2392
15.9%
e1698
11.3%
o1166
 
7.7%
995
 
6.6%
n878
 
5.8%
h703
 
4.7%
c643
 
4.3%
C640
 
4.3%
s583
 
3.9%
Other values (15)2964
19.7%

collision_type
Categorical

Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Rear Collision
204 
Side Collision
197 
Front Collision
182 
?
117 

Length

Max length15
Median length14
Mean length12.08714286
Min length1

Characters and Unicode

Total characters8461
Distinct characters16
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row?
2nd rowRear Collision
3rd rowFront Collision
4th rowFront Collision
5th rowFront Collision
ValueCountFrequency (%)
Rear Collision204
29.1%
Side Collision197
28.1%
Front Collision182
26.0%
?117
16.7%
Histogram of lengths of the category
ValueCountFrequency (%)
collision583
45.4%
rear204
 
15.9%
side197
 
15.4%
front182
 
14.2%
117
 
9.1%

Most occurring characters

ValueCountFrequency (%)
i1363
16.1%
o1348
15.9%
l1166
13.8%
n765
9.0%
583
6.9%
C583
6.9%
s583
6.9%
e401
 
4.7%
r386
 
4.6%
R204
 
2.4%
Other values (6)1079
12.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6595
77.9%
Uppercase Letter1166
 
13.8%
Space Separator583
 
6.9%
Other Punctuation117
 
1.4%

Most frequent character per category

ValueCountFrequency (%)
i1363
20.7%
o1348
20.4%
l1166
17.7%
n765
11.6%
s583
8.8%
e401
 
6.1%
r386
 
5.9%
a204
 
3.1%
d197
 
3.0%
t182
 
2.8%
ValueCountFrequency (%)
C583
50.0%
R204
 
17.5%
S197
 
16.9%
F182
 
15.6%
ValueCountFrequency (%)
?117
100.0%
ValueCountFrequency (%)
583
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7761
91.7%
Common700
 
8.3%

Most frequent character per script

ValueCountFrequency (%)
i1363
17.6%
o1348
17.4%
l1166
15.0%
n765
9.9%
C583
7.5%
s583
7.5%
e401
 
5.2%
r386
 
5.0%
R204
 
2.6%
a204
 
2.6%
Other values (4)758
9.8%
ValueCountFrequency (%)
583
83.3%
?117
 
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII8461
100.0%

Most frequent character per block

ValueCountFrequency (%)
i1363
16.1%
o1348
15.9%
l1166
13.8%
n765
9.0%
583
6.9%
C583
6.9%
s583
6.9%
e401
 
4.7%
r386
 
4.6%
R204
 
2.4%
Other values (6)1079
12.8%
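
The report shows 0 missing values for collision_type, yet 117 rows hold the placeholder "?". If that placeholder is meant to mark unknown values (an assumption, since the report does not say so), it can be converted to a real missing value so that it is counted as missing. A minimal sketch using the counts above:

```python
import pandas as pd

# Frequencies copied from the collision_type value table above.
counts = {
    "Rear Collision": 204,
    "Side Collision": 197,
    "Front Collision": 182,
    "?": 117,
}

# Expand the counts back into one value per observation (700 rows).
collision_type = pd.Series(
    [value for value, freq in counts.items() for _ in range(freq)],
    name="collision_type",
)

# Replace the "?" placeholder with a missing value (NaN).
cleaned = collision_type.mask(collision_type == "?")

missing = int(cleaned.isna().sum())
missing_pct = round(100 * cleaned.isna().mean(), 1)

print(missing, missing_pct)
print(cleaned.nunique())
```

When loading from CSV, the `na_values` parameter of `pandas.read_csv` can apply the same conversion at read time.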
Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Minor Damage
248 
Total Loss
209 
Major Damage
189 
Trivial Damage
54 

Length

Max length14
Median length12
Mean length11.55714286
Min length10

Characters and Unicode

Total characters8090
Distinct characters18
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTrivial Damage
2nd rowTotal Loss
3rd rowMajor Damage
4th rowMajor Damage
5th rowTotal Loss
ValueCountFrequency (%)
Minor Damage248
35.4%
Total Loss209
29.9%
Major Damage189
27.0%
Trivial Damage54
 
7.7%
Histogram of lengths of the category
ValueCountFrequency (%)
damage491
35.1%
minor248
17.7%
loss209
14.9%
total209
14.9%
major189
 
13.5%
trivial54
 
3.9%

Most occurring characters

ValueCountFrequency (%)
a1434
17.7%
o855
10.6%
700
 
8.7%
r491
 
6.1%
D491
 
6.1%
m491
 
6.1%
g491
 
6.1%
e491
 
6.1%
M437
 
5.4%
s418
 
5.2%
Other values (8)1791
22.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5990
74.0%
Uppercase Letter1400
 
17.3%
Space Separator700
 
8.7%

Most frequent character per category

ValueCountFrequency (%)
a1434
23.9%
o855
14.3%
r491
 
8.2%
m491
 
8.2%
g491
 
8.2%
e491
 
8.2%
s418
 
7.0%
i356
 
5.9%
l263
 
4.4%
n248
 
4.1%
Other values (3)452
 
7.5%
ValueCountFrequency (%)
D491
35.1%
M437
31.2%
T263
18.8%
L209
14.9%
ValueCountFrequency (%)
700
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7390
91.3%
Common700
 
8.7%

Most frequent character per script

ValueCountFrequency (%)
a1434
19.4%
o855
11.6%
r491
 
6.6%
D491
 
6.6%
m491
 
6.6%
g491
 
6.6%
e491
 
6.6%
M437
 
5.9%
s418
 
5.7%
i356
 
4.8%
Other values (7)1435
19.4%
ValueCountFrequency (%)
700
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8090
100.0%

Most frequent character per block

ValueCountFrequency (%)
a1434
17.7%
o855
10.6%
700
 
8.7%
r491
 
6.1%
D491
 
6.1%
m491
 
6.1%
g491
 
6.1%
e491
 
6.1%
M437
 
5.4%
s418
 
5.2%
Other values (8)1791
22.1%
Distinct5
Distinct (%)0.7%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Police
203 
Fire
164 
Ambulance
138 
Other
136 
None
59 

Length

Max length9
Median length5
Mean length5.76
Min length4

Characters and Unicode

Total characters4032
Distinct characters18
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPolice
2nd rowFire
3rd rowOther
4th rowOther
5th rowPolice
ValueCountFrequency (%)
Police203
29.0%
Fire164
23.4%
Ambulance138
19.7%
Other136
19.4%
None59
 
8.4%
Histogram of lengths of the category
ValueCountFrequency (%)
police203
29.0%
fire164
23.4%
ambulance138
19.7%
other136
19.4%
none59
 
8.4%

Most occurring characters

ValueCountFrequency (%)
e700
17.4%
i367
 
9.1%
l341
 
8.5%
c341
 
8.5%
r300
 
7.4%
o262
 
6.5%
P203
 
5.0%
n197
 
4.9%
F164
 
4.1%
A138
 
3.4%
Other values (8)1019
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3332
82.6%
Uppercase Letter700
 
17.4%

Most frequent character per category

ValueCountFrequency (%)
e700
21.0%
i367
11.0%
l341
10.2%
c341
10.2%
r300
9.0%
o262
 
7.9%
n197
 
5.9%
m138
 
4.1%
b138
 
4.1%
u138
 
4.1%
Other values (3)410
12.3%
ValueCountFrequency (%)
P203
29.0%
F164
23.4%
A138
19.7%
O136
19.4%
N59
 
8.4%

Most occurring scripts

ValueCountFrequency (%)
Latin4032
100.0%

Most frequent character per script

ValueCountFrequency (%)
e700
17.4%
i367
 
9.1%
l341
 
8.5%
c341
 
8.5%
r300
 
7.4%
o262
 
6.5%
P203
 
5.0%
n197
 
4.9%
F164
 
4.1%
A138
 
3.4%
Other values (8)1019
25.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII4032
100.0%

Most frequent character per block

ValueCountFrequency (%)
e700
17.4%
i367
 
9.1%
l341
 
8.5%
c341
 
8.5%
r300
 
7.4%
o262
 
6.5%
P203
 
5.0%
n197
 
4.9%
F164
 
4.1%
A138
 
3.4%
Other values (8)1019
25.3%

incident_state
Categorical

Distinct7
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
NY
197 
SC
166 
WV
149 
VA
80 
NC
74 
Other values (2)
34 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters1400
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNC
2nd rowNY
3rd rowWV
4th rowWV
5th rowWV
ValueCountFrequency (%)
NY197
28.1%
SC166
23.7%
WV149
21.3%
VA80
11.4%
NC74
 
10.6%
OH17
 
2.4%
PA17
 
2.4%
Histogram of lengths of the category
ValueCountFrequency (%)
ny197
28.1%
sc166
23.7%
wv149
21.3%
va80
11.4%
nc74
 
10.6%
pa17
 
2.4%
oh17
 
2.4%

Most occurring characters

ValueCountFrequency (%)
N271
19.4%
C240
17.1%
V229
16.4%
Y197
14.1%
S166
11.9%
W149
10.6%
A97
 
6.9%
O17
 
1.2%
H17
 
1.2%
P17
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1400
100.0%

Most frequent character per category

ValueCountFrequency (%)
N271
19.4%
C240
17.1%
V229
16.4%
Y197
14.1%
S166
11.9%
W149
10.6%
A97
 
6.9%
O17
 
1.2%
H17
 
1.2%
P17
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Latin1400
100.0%

Most frequent character per script

ValueCountFrequency (%)
N271
19.4%
C240
17.1%
V229
16.4%
Y197
14.1%
S166
11.9%
W149
10.6%
A97
 
6.9%
O17
 
1.2%
H17
 
1.2%
P17
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII1400
100.0%

Most frequent character per block

ValueCountFrequency (%)
N271
19.4%
C240
17.1%
V229
16.4%
Y197
14.1%
S166
11.9%
W149
10.6%
A97
 
6.9%
O17
 
1.2%
H17
 
1.2%
P17
 
1.2%

incident_city
Categorical

Distinct7
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Springfield
118 
Northbend
106 
Columbus
102 
Hillsdale
101 
Arlington
95 
Other values (2)
178 

Length

Max length11
Median length9
Mean length9.314285714
Min length8

Characters and Unicode

Total characters6520
Distinct characters26
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowArlington
2nd rowColumbus
3rd rowRiverwood
4th rowColumbus
5th rowSpringfield
ValueCountFrequency (%)
Springfield118
16.9%
Northbend106
15.1%
Columbus102
14.6%
Hillsdale101
14.4%
Arlington95
13.6%
Riverwood92
13.1%
Northbrook86
12.3%
Histogram of lengths of the category
ValueCountFrequency (%)
springfield118
16.9%
northbend106
15.1%
columbus102
14.6%
hillsdale101
14.4%
arlington95
13.6%
riverwood92
13.1%
northbrook86
12.3%

Most occurring characters

ValueCountFrequency (%)
o745
 
11.4%
l618
 
9.5%
r583
 
8.9%
i524
 
8.0%
e417
 
6.4%
d417
 
6.4%
n414
 
6.3%
b294
 
4.5%
t287
 
4.4%
g213
 
3.3%
Other values (16)2008
30.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5820
89.3%
Uppercase Letter700
 
10.7%

Most frequent character per category

ValueCountFrequency (%)
o745
12.8%
l618
10.6%
r583
10.0%
i524
9.0%
e417
 
7.2%
d417
 
7.2%
n414
 
7.1%
b294
 
5.1%
t287
 
4.9%
g213
 
3.7%
Other values (10)1308
22.5%
ValueCountFrequency (%)
N192
27.4%
S118
16.9%
C102
14.6%
H101
14.4%
A95
13.6%
R92
13.1%

Most occurring scripts

ValueCountFrequency (%)
Latin6520
100.0%

Most frequent character per script

ValueCountFrequency (%)
o745
 
11.4%
l618
 
9.5%
r583
 
8.9%
i524
 
8.0%
e417
 
6.4%
d417
 
6.4%
n414
 
6.3%
b294
 
4.5%
t287
 
4.4%
g213
 
3.3%
Other values (16)2008
30.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII6520
100.0%

Most frequent character per block

ValueCountFrequency (%)
o745
 
11.4%
l618
 
9.5%
r583
 
8.9%
i524
 
8.0%
e417
 
6.4%
d417
 
6.4%
n414
 
6.3%
b294
 
4.5%
t287
 
4.4%
g213
 
3.3%
Other values (16)2008
30.8%

witnesses
Categorical

Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
0
191 
1
179 
3
171 
2
159 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters700
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row3
4th row0
5th row2
ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%
Histogram of lengths of the category
ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%

Most occurring characters

ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number700
100.0%

Most frequent character per category

ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%

Most occurring scripts

ValueCountFrequency (%)
Common700
100.0%

Most frequent character per script

ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII700
100.0%

Most frequent character per block

ValueCountFrequency (%)
0191
27.3%
1179
25.6%
3171
24.4%
2159
22.7%
Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
NO
250 
YES
226 
?
224 

Length

Max length3
Median length2
Mean length2.002857143
Min length1

Characters and Unicode

Total characters1402
Distinct characters6
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowYES
2nd rowNO
3rd row?
4th row?
5th rowYES
ValueCountFrequency (%)
NO250
35.7%
YES226
32.3%
?224
32.0%
Histogram of lengths of the category
ValueCountFrequency (%)
no250
35.7%
yes226
32.3%
224
32.0%

Most occurring characters

ValueCountFrequency (%)
N250
17.8%
O250
17.8%
Y226
16.1%
E226
16.1%
S226
16.1%
?224
16.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1178
84.0%
Other Punctuation224
 
16.0%

Most frequent character per category

ValueCountFrequency (%)
N250
21.2%
O250
21.2%
Y226
19.2%
E226
19.2%
S226
19.2%
ValueCountFrequency (%)
?224
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1178
84.0%
Common224
 
16.0%

Most frequent character per script

ValueCountFrequency (%)
N250
21.2%
O250
21.2%
Y226
19.2%
E226
19.2%
S226
19.2%
ValueCountFrequency (%)
?224
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1402
100.0%

Most frequent character per block

ValueCountFrequency (%)
N250
17.8%
O250
17.8%
Y226
16.1%
E226
16.1%
S226
16.1%
?224
16.0%

auto_make
Categorical

HIGH CORRELATION

Distinct14
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Dodge
55 
Saab
55 
BMW
54 
Volkswagen
54 
Nissan
53 
Other values (9)
429 

Length

Max length10
Median length6
Mean length5.731428571
Min length3

Characters and Unicode

Total characters4012
Distinct characters33
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMercedes
2nd rowDodge
3rd rowVolkswagen
4th rowToyota
5th rowVolkswagen
ValueCountFrequency (%)
Dodge55
 
7.9%
Saab55
 
7.9%
BMW54
 
7.7%
Volkswagen54
 
7.7%
Nissan53
 
7.6%
Accura53
 
7.6%
Jeep53
 
7.6%
Chevrolet51
 
7.3%
Suburu50
 
7.1%
Mercedes48
 
6.9%
Other values (4)174
24.9%
Histogram of lengths of the category
ValueCountFrequency (%)
saab55
 
7.9%
dodge55
 
7.9%
volkswagen54
 
7.7%
bmw54
 
7.7%
nissan53
 
7.6%
jeep53
 
7.6%
accura53
 
7.6%
chevrolet51
 
7.3%
suburu50
 
7.1%
mercedes48
 
6.9%
Other values (4)174
24.9%

Most occurring characters

ValueCountFrequency (%)
e461
 
11.5%
a353
 
8.8%
o335
 
8.3%
r249
 
6.2%
u247
 
6.2%
d232
 
5.8%
s208
 
5.2%
c154
 
3.8%
n145
 
3.6%
g109
 
2.7%
Other values (23)1519
37.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3204
79.9%
Uppercase Letter808
 
20.1%

Most frequent character per category

ValueCountFrequency (%)
e461
14.4%
a353
11.0%
o335
10.5%
r249
 
7.8%
u247
 
7.7%
d232
 
7.2%
s208
 
6.5%
c154
 
4.8%
n145
 
4.5%
g109
 
3.4%
Other values (10)711
22.2%
ValueCountFrequency (%)
S105
13.0%
M102
12.6%
A97
12.0%
D55
6.8%
V54
6.7%
B54
6.7%
W54
6.7%
J53
6.6%
N53
6.6%
C51
 
6.3%
Other values (3)130
16.1%

Most occurring scripts

ValueCountFrequency (%)
Latin4012
100.0%

Most frequent character per script

ValueCountFrequency (%)
e461
 
11.5%
a353
 
8.8%
o335
 
8.3%
r249
 
6.2%
u247
 
6.2%
d232
 
5.8%
s208
 
5.2%
c154
 
3.8%
n145
 
3.6%
g109
 
2.7%
Other values (23)1519
37.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII4012
100.0%

Most frequent character per block

ValueCountFrequency (%)
e461
 
11.5%
a353
 
8.8%
o335
 
8.3%
r249
 
6.2%
u247
 
6.2%
d232
 
5.8%
s208
 
5.2%
c154
 
3.8%
n145
 
3.6%
g109
 
2.7%
Other values (23)1519
37.9%

auto_model
Categorical

HIGH CORRELATION

Distinct39
Distinct (%)5.6%
Missing0
Missing (%)0.0%
Memory size5.6 KiB
Wrangler
 
33
MDX
 
32
Jetta
 
30
RAM
 
29
Neon
 
26
Other values (34)
550 

Length

Max length14
Median length5
Mean length5.185714286
Min length2

Characters and Unicode

Total characters3630
Distinct characters52
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowE400
2nd rowNeon
3rd rowPassat
4th rowCorolla
5th rowJetta
ValueCountFrequency (%)
Wrangler33
 
4.7%
MDX32
 
4.6%
Jetta30
 
4.3%
RAM29
 
4.1%
Neon26
 
3.7%
A325
 
3.6%
Passat24
 
3.4%
E40022
 
3.1%
Camry21
 
3.0%
Pathfinder21
 
3.0%
Other values (29)437
62.4%
Histogram of lengths of the category
ValueCountFrequency (%)
wrangler33
 
4.5%
mdx32
 
4.4%
jetta30
 
4.1%
ram29
 
4.0%
neon26
 
3.5%
a325
 
3.4%
passat24
 
3.3%
e40022
 
3.0%
pathfinder21
 
2.9%
camry21
 
2.9%
Other values (31)470
64.1%

Most occurring characters

ValueCountFrequency (%)
a343
 
9.4%
e306
 
8.4%
r280
 
7.7%
o162
 
4.5%
i160
 
4.4%
t139
 
3.8%
n129
 
3.6%
M125
 
3.4%
l119
 
3.3%
s112
 
3.1%
Other values (42)1755
48.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2327
64.1%
Uppercase Letter870
 
24.0%
Decimal Number400
 
11.0%
Space Separator33
 
0.9%

Most frequent character per category

ValueCountFrequency (%)
a343
14.7%
e306
13.1%
r280
12.0%
o162
 
7.0%
i160
 
6.9%
t139
 
6.0%
n129
 
5.5%
l119
 
5.1%
s112
 
4.8%
d78
 
3.4%
Other values (13)499
21.4%
ValueCountFrequency (%)
M125
14.4%
C95
10.9%
A79
 
9.1%
X72
 
8.3%
R58
 
6.7%
F52
 
6.0%
P45
 
5.2%
L44
 
5.1%
S42
 
4.8%
E37
 
4.3%
Other values (10)221
25.4%
ValueCountFrequency (%)
598
24.5%
097
24.2%
381
20.2%
955
13.8%
422
 
5.5%
220
 
5.0%
116
 
4.0%
611
 
2.8%
ValueCountFrequency (%)
33
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3197
88.1%
Common433
 
11.9%

Most frequent character per script

ValueCountFrequency (%)
a343
 
10.7%
e306
 
9.6%
r280
 
8.8%
o162
 
5.1%
i160
 
5.0%
t139
 
4.3%
n129
 
4.0%
M125
 
3.9%
l119
 
3.7%
s112
 
3.5%
Other values (33)1322
41.4%
ValueCountFrequency (%)
598
22.6%
097
22.4%
381
18.7%
955
12.7%
33
 
7.6%
422
 
5.1%
220
 
4.6%
116
 
3.7%
611
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII3630
100.0%

Most frequent character per block

Value              Count  Frequency (%)
a                  343    9.4%
e                  306    8.4%
r                  280    7.7%
o                  162    4.5%
i                  160    4.4%
t                  139    3.8%
n                  129    3.6%
M                  125    3.4%
l                  119    3.3%
s                  112    3.1%
Other values (42)  1755   48.3%

auto_year
Real number (ℝ≥0)

Distinct       21
Distinct (%)   3.0%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           2004.984286
Minimum        1995
Maximum        2015
Zeros          0
Zeros (%)      0.0%
Memory size    5.6 KiB

Quantile statistics

Minimum                    1995
5-th percentile            1995
Q1                         2000
Median                     2005
Q3                         2010
95-th percentile           2014
Maximum                    2015
Range                      20
Interquartile range (IQR)  10

Descriptive statistics

Standard deviation               6.013198067
Coefficient of variation (CV)    0.002999124786
Kurtosis                         -1.179731951
Mean                             2004.984286
Median Absolute Deviation (MAD)  5
Skewness                         -0.07404549759
Sum                              1403489
Variance                         36.15855099
Monotonicity                     Not monotonic
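
Several of the figures above are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR and range are differences of quantiles. The short sketch below recomputes them from the values printed in this report; the variable names are illustrative, and the results agree with the report only up to the rounding of the printed inputs.

```python
# Figures copied from the auto_year summary in this report.
count = 700            # number of observations
total = 1403489        # Sum
mean = 2004.984286     # Mean
std = 6.013198067      # Standard deviation
q1, q3 = 2000, 2010    # Q1 and Q3
minimum, maximum = 1995, 2015

derived_mean = total / count        # should match Mean
cv = std / mean                     # coefficient of variation
variance = std ** 2                 # variance is the squared standard deviation
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum     # range

print(derived_mean, cv, variance, iqr, value_range)
```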
Histogram with fixed size bins (bins=21)
Common values

Value              Count  Frequency (%)
1995               47     6.7%
2011               43     6.1%
2007               41     5.9%
2002               38     5.4%
2009               38     5.4%
2005               37     5.3%
1999               37     5.3%
2008               35     5.0%
1997               35     5.0%
2012               35     5.0%
Other values (11)  314    44.9%

Minimum 5 values

Value              Count  Frequency (%)
1995               47     6.7%
1996               24     3.4%
1997               35     5.0%
1998               22     3.1%
1999               37     5.3%

Maximum 5 values

Value              Count  Frequency (%)
2015               28     4.0%
2014               27     3.9%
2013               31     4.4%
2012               35     5.0%
2011               43     6.1%

_c39
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing        700
Missing (%)    100.0%
Memory size    5.6 KiB

total_claim_amount
Real number (ℝ≥0)

Distinct       572
Distinct (%)   81.7%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           71900.93321
Minimum        133.33
Maximum        153226.67
Zeros          0
Zeros (%)      0.0%
Memory size    5.6 KiB

Quantile statistics

Minimum                    133.33
5-th percentile            6265.3365
Q1                         58933.33
Median                     77733.33
Q3                         95503.3325
95-th percentile           118230.6635
Maximum                    153226.67
Range                      153093.34
Interquartile range (IQR)  36570.0025

Descriptive statistics

Standard deviation               34915.97492
Coefficient of variation (CV)    0.4856122635
Kurtosis                         -0.330594352
Mean                             71900.93321
Median Absolute Deviation (MAD)  18506.665
Skewness                         -0.6208198363
Sum                              50330653.25
Variance                         1219125305
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
93866.67            4      0.6%
100533.33           4      0.6%
73333.33            3      0.4%
8533.33             3      0.4%
106400              3      0.4%
79200               3      0.4%
6160                3      0.4%
80800               3      0.4%
73600               3      0.4%
58933.33            3      0.4%
Other values (562)  668    95.4%

Minimum 5 values

Value               Count  Frequency (%)
133.33              1      0.1%
2560                1      0.1%
2880                1      0.1%
3200                1      0.1%
3520                2      0.3%

Maximum 5 values

Value               Count  Frequency (%)
153226.67           1      0.1%
149760              1      0.1%
144640              1      0.1%
144040              1      0.1%
143866.67           1      0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear function, the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
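
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code that produced this report; the function name `pearson_r` and the sample values are assumptions made for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Illustrative values only, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r_xy = pearson_r(x, y)

# Shifting and rescaling one variable (with a positive scale) leaves r unchanged.
r_shifted = pearson_r(x, [10 + 3 * v for v in y])
print(r_xy, r_shifted)
```

The second call illustrates the invariance under changes in location and scale mentioned above.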

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
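
A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of the ranks they span (a common convention, though not stated in this report). The helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Illustrative values: y is a nonlinear but strictly increasing function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ranks enter the calculation, the cubic relationship in the example is treated as a perfect monotonic association.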

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
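
The pair-counting definition above can be sketched directly. This is the variant usually called tau-a; library implementations often apply a tie adjustment (tau-b) instead, so results can differ when the data contain ties. The function name `kendall_tau_a` and the sample values are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)

# Illustrative values only: one of the six pairs is out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

The quadratic loop over all pairs is fine for a sketch; it is not how one would compute τ on large data.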

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
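
As a rough illustration, the sketch below computes Cramér's V from the chi-squared statistic of a contingency table and applies the bias correction as it is commonly stated for Bergsma's proposal. It is not taken from the software that generated this report; the function name `cramers_v_corrected` and the sample data are assumptions.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(phi2_corr / denom)

# Illustrative labels only: one perfectly associated pair, one independent pair.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(v_perfect, v_independent)
```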

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
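
The counts behind these nullity plots are simple to reproduce. The sketch below tallies missing values per column for a small list of records and derives the overall share of missing cells; the helper `missing_summary` and the three-row sample are illustrative assumptions, with `None` standing in for a missing value.

```python
def missing_summary(rows, columns):
    """Return (missing count per column, share of missing cells overall)."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) is None:
                counts[col] += 1
    total_cells = len(rows) * len(columns)
    missing_cells = sum(counts.values())
    share = missing_cells / total_cells if total_cells else 0.0
    return counts, share

# Three-row illustration in which `_c39` is always empty, as it is in this report.
columns = ["auto_year", "_c39", "total_claim_amount"]
rows = [
    {"auto_year": 2013, "_c39": None, "total_claim_amount": 14386.67},
    {"auto_year": 2006, "_c39": None, "total_claim_amount": 76440.00},
    {"auto_year": 2004, "_c39": None, "total_claim_amount": 79560.00},
]
counts, share = missing_summary(rows, columns)

# The overview's own figures: 700 missing cells out of 700 rows x 37 columns.
report_share = 700 / (700 * 37)
print(counts, share, round(report_share * 100, 1))
```

With a single fully empty column among 37, the share of missing cells is 1/37, which matches the 2.7% stated in the overview.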

Sample

First rows

Customer_ID, months_as_customer, age, insured_sex, insured_education_level, insured_occupation, insured_hobbies, insured_relationship, capital-gains, capital-loss, policy_number, policy_bind_date, policy_state, policy_csl, policy_deductable, incident_location, incident_hour_of_the_day, number_of_vehicles_involved, property_damage, bodily_injuries, policy_annual_premium, umbrella_limit, insured_zip, incident_date, incident_type, collision_type, incident_severity, authorities_contacted, incident_state, incident_city, witnesses, police_report_available, auto_make, auto_model, auto_year, _c39, total_claim_amount
0Customer_54123941FEMALEJDfarming-fishingpaintballother-relative51400-63007430922013-11-11OH250/50010006303 1st Drive221?01325.4470000004748982015-02-18Parked Car?Trivial DamagePoliceNCArlington2YESMercedesE4002013NaN14386.67
1Customer_44010831MALEMastersprotective-servyachtingnot-in-family004922242005-12-09IN500/100020005585 Washington Drive141NO01175.7006087672015-02-19Single Vehicle CollisionRear CollisionTotal LossFireNYColumbus2NODodgeNeon2006NaN76440.00
2Customer_48211630MALEJDhandlers-cleanersgolfnot-in-family0-355009962532001-11-29IN500/10005001328 Texas Lane83NO0951.4604672272015-01-31Multi-vehicle CollisionFront CollisionMajor DamageOtherWVRiverwood3?VolkswagenPassat2004NaN79560.00
3Customer_422821MALEHigh Schoolhandlers-cleanershikinghusband003550852012-10-09IN500/10005006117 4th Ave211?01021.9004642372015-02-05Single Vehicle CollisionFront CollisionMajor DamageOtherWVColumbus0?ToyotaCorolla2012NaN121680.00
4Customer_77816138MALEPhDpriv-house-servexercisenot-in-family6020001925242004-01-02IL100/30020002272 Embaracadero Drive03YES21133.8504398702015-01-03Multi-vehicle CollisionFront CollisionTotal LossPoliceWVSpringfield2YESVolkswagenJetta2003NaN80640.00
5Customer_94940755FEMALEPhDtech-supportbungie-jumpingwife0-577001932131996-03-11OH100/30010001806 Weaver Ridge03?21250.0850000004745982015-02-08Multi-vehicle CollisionSide CollisionTotal LossPoliceWVArlington3YESFordEscape2010NaN90880.00
6Customer_3349630MALECollegeprof-specialtyhikingwife38900-487004065672001-09-25OH100/3005009417 Tree Hwy221?01399.2760000004489132015-02-24Single Vehicle CollisionSide CollisionTotal LossFireNCArlington0YESFordEscape2004NaN71253.33
7Customer_57628246MALEMDother-servicedancingwife51100-751005026341991-08-17OH100/30020007954 Tree Ridge21?21558.8604508002015-02-17Single Vehicle CollisionFront CollisionMinor DamagePoliceNYSpringfield2NOBMWM52012NaN92533.33
8Customer_93414631FEMALECollegearmed-forcescampingown-child001498391990-09-21OH100/30010001110 4th Drive03NO11457.6550000006062192015-02-03Multi-vehicle CollisionRear CollisionMajor DamageAmbulanceVARiverwood3?ToyotaHighlander2010NaN69840.00
9Customer_56737154MALEHigh Schoolcraft-repairmovieswife34700-810004037762012-04-27IN100/30020006971 Best Ridge183?11317.9704698532015-01-18Multi-vehicle CollisionFront CollisionMajor DamageAmbulanceSCColumbus2?FordFusion2010NaN43040.00

Last rows

Customer_ID, months_as_customer, age, insured_sex, insured_education_level, insured_occupation, insured_hobbies, insured_relationship, capital-gains, capital-loss, policy_number, policy_bind_date, policy_state, policy_csl, policy_deductable, incident_location, incident_hour_of_the_day, number_of_vehicles_involved, property_damage, bodily_injuries, policy_annual_premium, umbrella_limit, insured_zip, incident_date, incident_type, collision_type, incident_severity, authorities_contacted, incident_state, incident_city, witnesses, police_report_available, auto_make, auto_model, auto_year, _c39, total_claim_amount
690Customer_12120636FEMALEMDother-servicevideo-gamesother-relative0-537002537912009-07-23IL500/10005002100 MLK St111NO21625.4540000006074522015-01-23Single Vehicle CollisionFront CollisionMajor DamageAmbulanceNYNorthbrook1NOFordFusion2008NaN102080.00
691Customer_61413133MALEMDsalesyachtingwife0-652004327401990-10-09IL100/30020003246 Britain Ridge31?01081.1704451202015-01-28Parked Car?Minor DamagePoliceNYNorthbend1NOToyotaCamry2010NaN6533.33
692Customer_2046062MALEJDother-servicebungie-jumpingown-child001834302002-06-25IN250/50010005380 Pine St203NO11187.9640000006188452015-01-01Multi-vehicle CollisionRear CollisionMinor DamagePoliceNYColumbus0?SuburuImpreza2011NaN62880.00
693Customer_7008531FEMALEMDtech-supportpaintballhusband008733842004-03-10IL250/50020007733 Britain Lane12NO21234.6990000006134712015-02-06Multi-vehicle CollisionFront CollisionMajor DamageOtherWVArlington1?BMWM52003NaN99200.00
694Customer_7122241FEMALEMDarmed-forcescross-fitnot-in-family37800-503002608451998-11-11OH100/30020006751 Pine Ridge71NO01055.5304419922015-02-08Single Vehicle CollisionFront CollisionTotal LossOtherWVNorthbrook2NOHondaCivic1995NaN81720.00
695Customer_10646461FEMALEAssociateprof-specialtybasketballhusband0-564006326271990-10-07OH500/100010004793 4th Ridge63?01125.3706044502015-01-13Multi-vehicle CollisionRear CollisionMajor DamagePoliceVANorthbend2YESSaab952000NaN106400.00
696Customer_27036955MALECollegehandlers-cleanerscampinghusband5540005778102013-04-15OH250/50020009373 Pine Hwy63?21589.5404447342015-01-27Multi-vehicle CollisionRear CollisionMinor DamagePoliceVAArlington0YESToyotaHighlander2003NaN113733.33
697Customer_86023042FEMALEMDadm-clericalgolfown-child0-453001759602004-11-16IN100/30010001589 Best Ave133NO11023.1104761302015-02-06Multi-vehicle CollisionRear CollisionMinor DamageOtherNYNorthbend2YESAccuraMDX1999NaN78466.67
698Customer_43510228MALEMDmachine-op-inspctreadingwife5520008101891999-08-29OH250/5005008021 Flute Ave61NO11075.4104456482015-02-15Single Vehicle CollisionSide CollisionTotal LossPolicePANorthbend0NODodgeNeon1996NaN97866.67
699Customer_10227941FEMALEJDprof-specialtybungie-jumpinghusband37300-317003892382001-06-06IL250/5005002199 Texas Drive163?21497.3504607422015-01-29Multi-vehicle CollisionFront CollisionMinor DamageFireNCNorthbrook3NOFordFusion2013NaN38400.00